SUMMARY


The present paper proposes a research paradigm for comparing
weight estimates to empirically derived "true" weights, thus obtaining
a measure of the criterion validity of different weight estimation
techniques. Subjects are first taught a multi-attribute utility (MAU)
model via multiple-cue probability learning (MCPL) and outcome feedback.
Then, various assessments of the importance weight parameters for the
model attributes are obtained. Composites formed from these weights
are subsequently compared to composites formed from optimal statistical
weights derived from outcome feedback.

Data are reported from 17 subjects who were taught one of three
"diamond worth" MAU models in 100 feedback trials. The models all
involved four attributes (cut, color, clarity, and carat weight), and
varied in the "environmental correlations" among the dimensions (either
(1) all uncorrelated, (2) one large positive correlation, or (3) two
large negative correlations). In addition to the usual MCPL indices
of consistency, achievement, and matching, pseudo-matching correlations
were computed for weights elicited via the direct subjective procedures
of ranking and ratio estimation, the indifference procedures of
pricing out and trading off to the most important dimension, and
regression weights derived from subjective estimates of the validity
coefficients. Overall, the composites formed from the subjects'
elicited weights closely corresponded to the "true" weight composites.

In  addition,  a high  degree  of  correspondence  was  demonstrated  among 
all  of  the  assessed  weighting  schemes.  Individual  differences  are  also 
reported. 




The  results  of  the  present  study  are  discussed  from  both  an 
applied  and  theoretical  perspective.  To  the  decision  analyst  in  the 
field,  the  present  results  give  support  to  the  belief  that  the 
parameter  estimates  obtained  from  clients  define  a "true"  normative 
preference function. Theoretically, the findings of this study are
strong evidence that people are aware of their cognitive processes.
Introduction


After  several  years  of  research  on  both  subjective  weights  and 
statistical  weights,  considerable  controversy  over  issues  of  validity 
exists.  Although  the  literature  a decade  ago  suggested  that  subjective 
weights  were  usually  poor  (Slovic  & Lichtenstein,  1971),  recent  research 
has  not  confirmed  this.  On  the  contrary,  many  studies  demonstrating 
the  convergent  and  criterion  validity  of  subjective  weights  have 
appeared  (John  & Edwards,  Note  1).  However,  influential  papers  in  the 
field continue to cite the old view that subjects' subjective estimates
of attribute importance bear little relationship to reality (e.g.,
Nisbett & Wilson, 1977).

One of the strongest recent findings is that of Schmitt (1978).
He taught his subjects a riskless, additive multi-attribute utility
function  via  outcome  feedback  in  a multiple-cue  probability  learning 
(MCPL)  setting.  Obtaining  least-squares  regression  weights  and  three 
different  sets  of  subjective  weights,  Schmitt  compared  the  composites 
derived  from  these  weights  to  those  resulting  from  the  "true"  regression 
weights  used  to  generate  the  outcome  feedback.  He  found  that  there 
were  absolutely  no  differences  between  the  matching  indices  (correlations 
between  composites  formed  from  "true"  weights  and  from  subjects'  weights) 
across  the  four  sets  of  obtained  weights.  Thus,  Schmitt  produced  hard 
evidence  supporting  the  accuracy  of  subjective  weights. 

Two problems with Schmitt's study deserve mention. First, large
positive intercorrelations between attributes were present in all
conditions of the experiment. In the face of such serious
multicollinearity problems, the least-squares regression weights are
suspect. While the "true" validity coefficients were all moderately
positive (ranging from .42 to .53), the "true" regression weights
were non-uniform, and included some negative regression weights (e.g.,
.63, -.15, .16, .40 for the four-attribute problem). Second, in addition to
the problem of determining the "true" weights, the high multicollinearity
presents an even more serious problem. Large positive intercorrelations
among dimensions imply that all weighting schemes will yield highly
convergent composites. Thus, one is led to suspect that Schmitt (1978)
would have had difficulty separating good weights from poor ones, even
if he had been able to identify an unambiguous set of "true" weights.

Interestingly,  the  average  subjective  weights  reported  by  Schmitt 
are  markedly  uniform.  The  maximum  ratio  between  any  pair  of  weights 
was  about  two,  and  most  were  essentially  equal  weighting.  It  appears 
that  the  subjective  weights  obtained  by  Schmitt  were  closer  to  the 
validity  coefficients  than  to  the  least-squares  regression  weights. 

Although  our  study  was  designed  and  performed  independently  of 
Schmitt's,  the  two  are  natural  extensions  of  one  another.  We,  too, 
taught  subjects  a multi-attribute  utility  function  via  outcome  feedback, 
and  we  found  that  subjects  are  good  at  learning  weights. 

Subjective weights, as well as inferred statistical weights, were
compared to the "true" weights derived from the outcome feedback
provided. Experiment I is the first comparison of subjective and
statistical weights to a "true" model taught under controlled conditions in
a context free of interattribute correlations. Experiment II is
unique in that the "true" model weights are determined, not through a
standard least-squares regression, but by ridge regression. Also,
Experiment II is the first test of an idea, originally proposed by
Newman (Note 2), for treating subjective weight estimates as validity
coefficients (and not as weight parameters).

Extending Newman's basic idea, Experiment II compared weight
parameters based on a ridge regression performed on subjective weight
estimates (treated as validity coefficient estimates) with the "true"
model weights, derived from a ridge regression on the criterion
provided during the outcome feedback trials. In addition, weight
elicitation procedures, developed from the axioms of multi-attribute utility
theory and not heretofore tested in the MCPL paradigm, were among those
employed in Experiment II.


Experiment  I 


Method 

Subjects.  Nine  undergraduate  students  at  the  University  of 
Southern California volunteered for the experiment in partial
fulfillment of course requirements in Introductory Psychology. The five males
and  four  females  received  no  other  direct  compensation  or  incentive 
beyond  class  credits.  Subjects  were  run  individually  in  sessions  lasting 
from  60  to  90  minutes. 

Training  procedures.  Each  subject  was  seated  in  front  of  a cathode 
ray  tube  (CRT)  screen.  A standardized  cover  story  was  given  to  each 
subject,  explaining  how  he/she  was  to  learn,  via  "computer  assisted 
instruction,"  the  manner  in  which  diamonds  are  appraised.  The  subject 
was  told  that  diamonds  could  be  evaluated  on  four  attributes  (cut,  color, 
clarity,  and  carats)  and  that  each  diamond  would  be  presented  as  a 
"profile"  of  ratings  (between  0 and  10)  on  each  of  the  four  attributes. 
The  ratings  were  all  related  to  some  physical  characteristic  of  the 
diamond:  cut  is  determined  from  a formula  for  combining  certain 
critical  angles  and  length-to-width  ratios  obtained  from  very  precise 
measuring  devices;  color  is  determined  by  examining  the  diamond  under 
a spectroscope;  clarity  refers  to  the  number  and  severity  of  "inclusions" 
revealed  under  a microscope;  and  carat  rating  is  related  to  the  weight 
of  the  stone,  such  that  the  smaller  the  number  the  lighter  the  stone. 
Subjects  were  informed  that  in  all  cases  higher  attribute  ratings  were 
better  than  lower  ones. 

After an explanation of how the training would proceed and
instruction on how to operate the response keyboard connected to the CRT screen,
subjects began the training phase of the experiment. The entire
training  phase  was  controlled  by  a computer  program.  Subjects  first 
saw  a "diamond  profile,"  presented  in  the  following  format: 

CUT  COLOR  CLARITY  CARAT 

8.6  5.4  8.9  2.1 

The  program  then  prints  the  prompt  (PRICE?),  and  waits  for  the  subject 
to  estimate  the  price  of  the  diamond.  After  a number  has  been  properly 
entered  (via  the  keyboard),  the  program  informs  the  subject  of  the 
"true"  price  of  the  diamond  (outcome  feedback)  and  how  much  over  or 
under  the  estimate  is.  The  program  stores  the  subject's  response, 
clears  the  screen,  and  presents  the  next  diamond  profile.  In  all,  each 
subject  saw  100  such  diamond  profiles  and  outcome  feedback. 
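
The trial sequence just described can be sketched in a few lines. This is only an illustration of the procedure, not the program actually used: the function names, the wording of the feedback message, and the `get_estimate` callback (which stands in for keyboard entry) are all assumptions.

```python
def outcome_feedback(estimate, true_price):
    """Build the feedback line shown after a price estimate (wording is hypothetical)."""
    error = estimate - true_price
    if error > 0:
        direction = f"{error:.0f} over"
    elif error < 0:
        direction = f"{-error:.0f} under"
    else:
        direction = "exactly right"
    return f"TRUE PRICE: {true_price:.0f} (your estimate was {direction})"


def run_trials(profiles, true_prices, get_estimate):
    """Present each profile, collect an estimate, give feedback, store the response."""
    responses = []
    for profile, price in zip(profiles, true_prices):
        estimate = get_estimate(profile)  # stands in for the subject's keyboard entry
        print(outcome_feedback(estimate, price))
        responses.append(estimate)
    return responses
```

The stored responses are what the later achievement and consistency analyses operate on.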

MAU model. The attribute values specified on the 100 diamond
profiles were generated independently from a uniform density function with
endpoints 0 and 10. Thus, the expected value of the mean rating on each
of the attributes is 5, and the expected variance is about 8.3; also,
the expected value of the intercorrelation among the attributes is 0.
Since the same "seed" was used to start the random-number generator
subroutine for each subject, all subjects saw the same 100 diamond
profiles and received the same outcome feedback. Sample means, variances,
attribute intercorrelations, validity coefficients, and least-squares
regression weights, based on the profiles and feedback provided during
the 100 learning trials, are presented in Table 1. As is evident, the
sample means, variances, and intercorrelations of the four attributes
are very nearly the same as their expected values.
Table 1

Sample Attribute Intercorrelations, Validity Coefficients,
and Regression Statistics
Experiment I

                                                  Validity      Regression
                    Intercorrelation              Coefficient   Statistics
Attribute    CUT    COLOR   CLARITY   CARAT       Price         OLS beta (β)   r·β

CUT          7.7    -.16    -.08      -.07        .36           .46            .17
COLOR               9.1      .01      -.01        .03           .11            .00
CLARITY                      7.4      -.13        .09           .24            .02
CARAT                                  7.7        .84           .90            .75
Price                                             149 x 10^4
Mean         4.9    5.1      5.0       5.1        4259

Note. Diagonal entries are sample variances.


The outcome feedback used to train the subjects was generated
from the following model:

TRUE PRICE = 200*CUT + 50*COLOR + 100*CLARITY +
             400*CARAT + 500 + 300*N(0,1)                       (1)

where N(0,1) is normal random error with mean 0 and variance 1. The
expected value of the mean price is 4250, and the expected total
variance is about $186 \times 10^4$. Since the expected error variance is only
$9 \times 10^4$, the expected value of the multiple correlation, $R_e$, is .98
($= \sqrt{(186 - 9)/186}$). The sample values of the price mean and variance,
given in Table 1, are all quite close to their expected values, as is
the sample value of $\hat{R}_e$ ($= \sqrt{\sum r\beta} = .97$). Since the attribute variances
are approximately equal, and the attribute intercorrelations are close
to zero, the ordinary least-squares (OLS) betas given in Table 1 are
roughly proportional to the attribute weights defined in Equation 1.
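
The expected values quoted for Equation 1 follow directly from the moments of a uniform distribution on 0 to 10 (mean 5, variance 100/12). The following sketch reproduces that arithmetic, along with the sample multiple correlation computed from the validity coefficients and OLS betas as read from Table 1. The variable names are illustrative only.

```python
import math

# Attribute weights, intercept, and error SD from Equation 1.
weights = {"CUT": 200, "COLOR": 50, "CLARITY": 100, "CARAT": 400}
intercept, error_sd = 500, 300

# Moments of a uniform density with endpoints 0 and 10.
attr_mean = (0 + 10) / 2
attr_var = (10 - 0) ** 2 / 12          # about 8.3

expected_mean = intercept + sum(w * attr_mean for w in weights.values())
signal_var = sum(w ** 2 * attr_var for w in weights.values())
total_var = signal_var + error_sd ** 2  # attributes and error are independent
expected_R_e = math.sqrt(signal_var / total_var)

# Sample multiple correlation: square root of the sum of r * beta (Table 1).
validity = {"CUT": .36, "COLOR": .03, "CLARITY": .09, "CARAT": .84}
beta = {"CUT": .46, "COLOR": .11, "CLARITY": .24, "CARAT": .90}
sample_R = math.sqrt(sum(validity[a] * beta[a] for a in weights))

print(expected_mean, round(total_var / 1e4), round(expected_R_e, 2), round(sample_R, 2))
```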

Direct subjective weight assessment. After completing 100 learning
trials, the subject was led into an adjoining room and subjective weights
were assessed. Two procedures were used. First, the subject was simply
asked to rank-order the attributes from most important to least
important in determining overall diamond worth. Next, ratio weights were
elicited using Edwards' SMART procedure. The least important attribute
(identified from the rank-ordering) was assigned a weight of 10, and
weights on the other three attributes were determined by the subject.
The subject was instructed to make sure that the ratio of any pair of
importance weights reflected the number of times more important one
attribute was than the other. The ratio weights were simply normalized
to sum to one.
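
As a small worked example of the normalization step, suppose a subject anchors the least important attribute at 10 and assigns the remaining raw weights shown below. These raw values are hypothetical, chosen only for illustration; they are not data from the experiment.

```python
def normalize_ratio_weights(raw):
    """Rescale raw ratio judgments so that the weights sum to one."""
    total = sum(raw.values())
    return {attr: value / total for attr, value in raw.items()}


# Hypothetical raw judgments: least important attribute anchored at 10.
raw = {"COLOR": 10, "CLARITY": 20, "CUT": 40, "CARAT": 80}
weights = normalize_ratio_weights(raw)

for attr, w in weights.items():
    print(f"{attr:8s} {w:.3f}")
```

Normalization leaves every pairwise ratio unchanged, so a subject who judged CARAT eight times as important as COLOR still has a weight ratio of eight after rescaling.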




Results 

Achievement. The correlation between a subject's responses and
"true" diamond prices (provided in outcome feedback) is called
"achievement" ($r_a$). It is useful to examine the achievement scores as
an indication of the extent to which each subject's knowledge of the model,
gained through outcome feedback, was reflected in his/her holistic
evaluations. Every subject improved substantially from the first block
of 50 trials to the second block of 50 trials. The median value of $r_a$
increased from .68 to .76. It should be noted that had a subject simply
responded with numbers proportional to the sum of the four attribute
values (equal weighting), a score of .73 would have resulted for $r_a$.
Also, had a subject simply responded with numbers proportional to the
value of the diamond on the most important attribute (CARAT), and ignored
the other three attributes, he/she would have received a score of .84 for
$r_a$.
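
The two heuristic benchmarks can be recovered from the summary statistics alone. The sketch below computes the correlation between a weighted sum of raw attribute ratings and price, using the variances, intercorrelations, and validity coefficients as read from Table 1; the assignment of each intercorrelation to its attribute pair is an assumption based on the table's upper-triangular layout.

```python
import math

attrs = ["CUT", "COLOR", "CLARITY", "CARAT"]
variance = {"CUT": 7.7, "COLOR": 9.1, "CLARITY": 7.4, "CARAT": 7.7}
validity = {"CUT": .36, "COLOR": .03, "CLARITY": .09, "CARAT": .84}
intercorr = {
    ("CUT", "COLOR"): -.16, ("CUT", "CLARITY"): -.08, ("CUT", "CARAT"): -.07,
    ("COLOR", "CLARITY"): .01, ("COLOR", "CARAT"): -.01, ("CLARITY", "CARAT"): -.13,
}


def composite_validity(w):
    """Correlation between sum(w[a] * rating[a]) and price, from summary statistics."""
    sd = {a: math.sqrt(variance[a]) for a in attrs}
    # Covariance of the composite with price, in units of the price SD.
    cov = sum(w[a] * validity[a] * sd[a] for a in attrs)
    var = sum(w[a] ** 2 * variance[a] for a in attrs)
    for (a, b), r in intercorr.items():
        var += 2 * w[a] * w[b] * r * sd[a] * sd[b]
    return cov / math.sqrt(var)


equal = composite_validity({a: 1 for a in attrs})
carat_only = composite_validity({"CUT": 0, "COLOR": 0, "CLARITY": 0, "CARAT": 1})
print(round(equal, 2), round(carat_only, 2))
```

Under these assumptions the equal-weighting composite comes out near .73 and the CARAT-only composite equals the CARAT validity coefficient, in line with the benchmark values quoted in the text.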


Consistency and pseudo-consistency. For each block of 50 trials,
a standard multiple regression was performed on each subject's holistic
evaluations, using the four attributes as the "predictor" variables.
The regression weights derived represent estimates of subjects'
importance weight parameters, as was discussed earlier. For each of the 100
stimulus diamonds, composite estimates of worth were formed by applying
the subjects' regression weights, ratio weights, and rank weights. The
consistency index ($R_s$) is the adjusted correlation between the composites
formed from the subjects' regression weight model and the holistic
evaluations of the subject. (Wherry's shrinkage formula was applied to
the obtained R to correct for the usual inflated multiple
correlations.) Pseudo-consistency is the correlation between direct
subjective weight models (ratio and rank) and the holistic evaluations
of the subject.
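
For readers unfamiliar with the shrinkage correction, the following sketch shows one commonly cited form of Wherry's formula, in which the adjusted squared multiple correlation is 1 - (1 - R^2)(n - 1)/(n - k - 1). Whether this exact variant was the one applied here is an assumption; the function name and example values are illustrative.

```python
import math


def wherry_adjusted_r(r, n, k):
    """Shrunken multiple correlation for n observations and k predictors.

    Uses adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1), one common
    form of Wherry's formula; negative results are truncated to zero.
    """
    adj_r2 = 1 - (1 - r ** 2) * (n - 1) / (n - k - 1)
    return math.sqrt(max(adj_r2, 0.0))


# One block of 50 trials with four attribute predictors.
print(round(wherry_adjusted_r(0.80, n=50, k=4), 3))
```

With only 50 trials and four predictors the correction is modest for large correlations but becomes severe as the obtained R approaches zero.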

There were three important results regarding consistency and
pseudo-consistency. First, the models derived from the three weight
estimates were all more consistent with holistic choices over the
last half of the training session than over the first half. This
increase was especially pronounced for the regression-weight model, where
the median adjusted $R_s$ changed from .68 to .80. The effect was smaller
for the two subjective weight models: the median for the ratio-weight
model increased from .65 to .70, and the median for the rank-weight
model increased from .68 to .72.

Second, the consistency scores over the last block of trials
(median = .80) were substantially larger than the pseudo-consistency
scores over the last block (ratio median = .70, rank median = .72).
This result is in part explained by the uniformly low
pseudo-consistency scores over the last 50 trials by Subjects #7, 8, and 9.
Neither the ratio nor rank weights elicited from these three
subjects were consistent with the weighting policy used in making the
holistic evaluations during the last 50 trials.

The third main result was the near equivalence of
pseudo-consistency scores obtained with the ratio-weight model and with the
rank-weight model. Apparently, the subjects' weighting policy used
in making holistic evaluations is as well described by their
subjective rankings of the attributes as by their ratio estimates of
attribute importance.
Criterion validity: Matching and pseudo-matching. To assess
the criterion validity of each of the three sets of weights, composites
formed from each (the same as those discussed in the previous section
under convergence) were correlated with composites formed from the
"true model" weights, determined from an OLS regression analysis of
the outcome feedback (given in Table 1). These correlations, presented
in Table 2, are the usual "matching" indices used in MCPL research.
The term "pseudo-matching" has been used to describe the correlations
involving composites formed from direct subjective weight assessment
(ratio and rank), since "matching" is traditionally reserved for the
model derived from a regression analysis of holistic choices.
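
A matching or pseudo-matching index is simply the correlation between two sets of composites computed over the same profiles. The sketch below illustrates the computation on simulated uncorrelated profiles; the profiles, the seed, and the "subject" weights are hypothetical stand-ins, and raw Equation 1 weights are used in place of the OLS weights for simplicity.

```python
import math
import random


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def composite(profile, weights):
    """Weighted additive composite for one profile."""
    return sum(w * v for w, v in zip(weights, profile))


def matching_index(profiles, weights_a, weights_b):
    """Correlation between composites formed from two weight sets."""
    a = [composite(p, weights_a) for p in profiles]
    b = [composite(p, weights_b) for p in profiles]
    return pearson(a, b)


rng = random.Random(0)  # arbitrary seed, for reproducibility only
profiles = [[rng.uniform(0, 10) for _ in range(4)] for _ in range(100)]

true_w = [200, 50, 100, 400]          # CUT, COLOR, CLARITY, CARAT (Equation 1)
subject_w = [0.27, 0.07, 0.13, 0.53]  # hypothetical normalized ratio weights
equal_w = [1, 1, 1, 1]                # equal-weighting heuristic

print(matching_index(profiles, true_w, subject_w))
print(matching_index(profiles, true_w, equal_w))
```

Because the index is a correlation, it is unaffected by rescaling either weight set; only the relative sizes of the weights matter.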


In general, all of the matching and pseudo-matching indices were
quite high: models derived from subjects' judgments, whether holistic
evaluations (regression weights) or direct assessments (ratio and rank
weights), were in good agreement with the "true" multi-attribute utility
model. Virtually all of the scores are greater than those obtained
with either of the simple heuristic models ("Equal" weights median =
.76, "Extreme" weights median = .86). There is some indication that
the subjects' statistical regression-weight model (median = .97) is
better than the ratio- and rank-weight models (both medians = .94), but
these differences appear slight. Most of these differences can be
attributed to the inferiority of the direct assessments of Subjects #7,
8, and 9. As was discussed earlier, the ratio and rank weights
elicited from these three subjects were not consistent with their
holistic choices.


Discussion 


Experiment I was designed to test the validity of three procedures
for assessing importance weights in the simplest multiattribute
situation imaginable: four uncorrelated attributes that combine to
determine virtually all of the variance in the hypothetical "overall
utility" of the object. The MCPL paradigm provides a standard, or
"true" multiattribute utility function against which assessed weight
parameters were compared. In general, results indicate that all three
weighting schemes are consistent with holistic evaluations, convergent
with one another, and closely match the "true" MAU model taught via
outcome feedback.

An idiographic analysis suggests that individual differences are
present, and that the two direct methods for obtaining importance
weights (ratio and rank) were not valid for three of the nine subjects.
The relatively high level of achievement obtained by these three
subjects, as well as their high consistency and matching scores for the
regression weights model, suggest that they did learn the MAU diamond
model given in Equation 1. Apparently, these three subjects were
either unaware of their learned subjective model for diamond worth,
or did not understand the instructions for ratio- and rank-weight
assessment. Neither of these alternative explanations is palatable, however.
The achievement, consistency, and matching indices were simply too
high to justify unawareness, and there is not much to misinterpret in
the instructions to "rank-order the attributes from most important to
least important."

It is intriguing to note, post hoc, that the three subjects in
question are all female; thus, although all five male subjects gave
valid subjective importance weights, only one out of four females did so.
Interestingly, all three ranked CUT as the most important attribute.
Since CUT was also the second most important attribute in the "true"
MAU model, this agreement is ambiguous. Subjects #7, 8, and 9 could have
been expressing a common fact about real diamonds, or they could have
simply "come close" in their direct estimate of the most important
attribute. Obviously, more data are required before these
sex-difference speculations can be resolved.

Experiment  II 

Method 

Subjects.  Eight  undergraduate  students  at  the  University  of 
Southern  California  volunteered  for  the  experiment.  The  seven  males 
and  one  female  (chosen  without  knowledge  of  Experiment  I results) 
received class credits to fulfill requirements in Introductory
Psychology and received no other compensation. Subjects were run individually
in  sessions  lasting  from  60  to  90  minutes. 

Training  procedures.  All  procedures  during  the  training  phase 
of  Experiment  II  (with  the  exception  of  the  composition  of  the 
programmed  MAU  model)  were  identical  to  those  in  the  first  experiment. 

MAU models. Two different additive MAU models, each utilizing
the four "C" attributes from the first experiment, were used to
generate the diamond profiles and corresponding outcome feedback. Half of
the subjects saw profiles and feedback from Model "P", which involved a
rather large positive correlation between COLOR and CLARITY; the other
half were trained on Model "N", which generated diamond profiles with
rather large negative correlations between COLOR and CLARITY and
between CUT and CARAT.

For Model P, three of the four attributes (all but CLARITY) were
generated independently from a uniform density function with endpoints
0 and 10. Values on the CLARITY attribute were generated as a
function of COLOR and normally distributed random error (CLARITY = COLOR +
2*N(0,1)). Instances in which the value of CLARITY would have been
negative or greater than ten were discarded, and an entire new
profile was generated. Thus, the expected value of the mean rating on
all four attributes is 5. Had no profiles been discarded, the
expected value of the variances would have been 8.3 for all attributes
except CLARITY, which would have been about 12.3. Since some were
discarded, the true expected variances are unknown. The expected
value of the attribute intercorrelations is zero, except, of course,
for that between COLOR and CLARITY. Although one might expect the
correlation to be high, the exact value is unknown, since the
calculation involves the expected value of the attribute variances. As
is evident in the top portion of Table 3, all of the attribute sample
means and intercorrelations, based on the 100 profiles, are close to
their expected values. The sample intercorrelation between COLOR and
CLARITY is .86, and all of the attribute variances are reasonable.
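
The discard-and-regenerate rule for Model P can be sketched as a simple rejection loop. This is an illustrative reconstruction only: the seed, the order in which attributes are drawn, and the function names are assumptions, so the profiles it produces are not those seen by the subjects.

```python
import math
import random


def generate_model_p_profiles(n=100, seed=1):
    """Generate n (CUT, COLOR, CLARITY, CARAT) profiles under the Model P rule."""
    rng = random.Random(seed)  # arbitrary seed, not the one used in the study
    profiles = []
    while len(profiles) < n:
        cut = rng.uniform(0, 10)
        color = rng.uniform(0, 10)
        carat = rng.uniform(0, 10)
        clarity = color + 2 * rng.gauss(0, 1)
        # Discard the entire profile if CLARITY falls outside the 0-10 range.
        if 0 <= clarity <= 10:
            profiles.append((cut, color, clarity, carat))
    return profiles


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


profiles = generate_model_p_profiles()
color = [p[1] for p in profiles]
clarity = [p[2] for p in profiles]
print(round(pearson(color, clarity), 2))
```

Because whole profiles are discarded, the retained COLOR values are no longer exactly uniform, which is why the expected variances and the expected COLOR-CLARITY correlation are not available in closed form from the generating rule alone.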


The outcome feedback used to train the four Model P subjects was
generated from the following model:

TRUE PRICE = 0*CUT + 60*COLOR + 20*CLARITY + 40*CARAT + 100 + 200*N(0,1)   (8)

The expected value of the mean price is 700. Since the formulae for the
expected price variance and the multiple correlation both require the
expected values of the attribute variances, their values are unknown.
The sample values of the price mean and variance, along with sample
values of validity coefficients and OLS regression weights, are given
at the top of Table 3. The square root of the sum of the products of
r and β, .82, is the sample value of the environmental multiple
correlation, $R_e$.

Attribute Intercorrelations, Validity
Coefficients and Regression Statistics

Because of the high multicollinearity between COLOR and CLARITY,
one might suspect that the attribute (predictor) intercorrelation matrix
is ill-conditioned. The observation of an eigenvalue of .13 provided
confirmation. With small eigenvalues in the predictor intercorrelation
matrix, major discrepancies between the OLS regression weights and
the "true" population weights are virtually guaranteed. Ridge
regression was applied to the sample attribute intercorrelations and
validity coefficients displayed in the top portion of Table 3. A "ridge
trace" was generated, and the constant value (.2) added to the diagonal
of the correlation matrix was chosen at that point in the trace where
the betas seemed to stabilize. The ridge regression weights are also
presented at the top of Table 3. They yield a sample multiple
correlation of .81, only slightly less than that for OLS weights. As can
be seen, these weights are strikingly different from the OLS regression
weights. In particular, the sign of the CLARITY weight, negative for
the OLS analysis, is positive for the ridge analysis. Also, the
magnitude of the COLOR weight has decreased substantially (from .79 to
.52). In general, the ridge regression weights are much closer to the
validity coefficients than are the OLS weights.
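
The effect of adding a constant to the diagonal can be seen in a reduced two-predictor example. The sketch below solves (R + kI)β = r for COLOR and CLARITY only, using the reported intercorrelation of .86; the two validity coefficients are illustrative assumptions (Table 3 is not reproduced here), and ignoring CUT and CARAT is a simplification, so the resulting weights are not those of the study.

```python
def solve_2x2(a, b, c, d, y1, y2):
    """Solve [[a, b], [c, d]] @ [x1, x2] = [y1, y2] by Cramer's rule."""
    det = a * d - b * c
    return ((d * y1 - b * y2) / det, (a * y2 - c * y1) / det)


def ridge_betas(r12, r1y, r2y, k=0.0):
    """Standardized weights from (R + k*I) beta = r; k = 0 gives OLS."""
    return solve_2x2(1 + k, r12, r12, 1 + k, r1y, r2y)


r12 = 0.86                       # COLOR-CLARITY sample intercorrelation
r_color, r_clarity = 0.76, 0.62  # assumed validity coefficients (illustrative)

# Eigenvalues of a 2x2 correlation matrix are 1 + r12 and 1 - r12.
eigenvalues = (1 + r12, 1 - r12)

ols = ridge_betas(r12, r_color, r_clarity)           # k = 0
ridge = ridge_betas(r12, r_color, r_clarity, k=0.2)  # constant added to diagonal

print("eigenvalues:", eigenvalues)
print("OLS   (COLOR, CLARITY):", ols)
print("ridge (COLOR, CLARITY):", ridge)
```

Under these assumed inputs the OLS weight on CLARITY is negative while the ridge weight is positive and the COLOR weight shrinks, which is the same qualitative pattern described for the full four-attribute analysis.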

An analogous procedure was followed for Model N. Here, two of
the attributes (CUT and COLOR) were generated independently from a
uniform density function with endpoints 0 and 10. Values on the
CLARITY attribute were generated as a function of COLOR and normally
distributed random error (CLARITY = 10 - COLOR + N(0,1)); CARAT was
generated from CUT and normally distributed random error (CARAT =
10 - CUT + 2*N(0,1)). As for Model P, any profile with a value on
CLARITY or CARAT outside the 0 to 10 range was discarded and a new
profile was generated. Thus, the expected value of the mean rating
on all four attributes is 5, and the expected values of the
variances are again unknown, due to the discarding of some generated
profiles. Had no profiles been discarded, the expected variances
would have been about 8.3 for CUT and COLOR, about 9.3 for CLARITY,
and about 12.3 for CARAT. The expected value of the attribute
intercorrelations is zero, except for that between CUT and CARAT and
between COLOR and CLARITY. Although these two correlations are
expected to be negative, calculation of their exact value requires
the expected values of the attribute variances, which are unknown.
The sample attribute means and intercorrelations, presented in the
bottom portion of Table 3, are all close to their expected values.
The sample intercorrelation between COLOR and CLARITY is -.95, and
that between CUT and CARAT is -.74. Overall, the sample attribute
variances are lower than those for Model P.

The outcome feedback was generated from the following model:

TRUE PRICE = 30*CUT + 80*COLOR + 10*CLARITY + 60*CARAT + 300 + 150*N(0,1)   (9)

The expected value of the mean price is 1200, and the expected
variance and expected multiple correlation are both unknown, since they
depend upon the unknown expected attribute variances. The sample
price mean and variance, the validity coefficients, and the OLS
regression weights are given at the bottom of Table 3. The model
multiple correlation, $R_e$, is .84 ($= \sqrt{\sum r\beta}$).
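
The expected mean prices for Equations 8 and 9, and the "no discard" variances quoted for the derived attributes, can be checked with a few lines of arithmetic. The sketch assumes an expected rating of 5 on every attribute and a uniform variance of 100/12; the variable names are illustrative.

```python
attr_mean = 5.0
uniform_var = 10 ** 2 / 12  # variance of a uniform density on 0 to 10, about 8.3


def expected_mean_price(weights, intercept):
    """Expected price when every attribute has an expected rating of 5."""
    return intercept + sum(w * attr_mean for w in weights)


# Weights in the order CUT, COLOR, CLARITY, CARAT.
mean_p = expected_mean_price([0, 60, 20, 40], 100)   # Equation 8
mean_n = expected_mean_price([30, 80, 10, 60], 300)  # Equation 9

# Variances that would hold if no profiles were discarded:
var_clarity_p = uniform_var + 2 ** 2  # CLARITY = COLOR + 2*N(0,1)
var_clarity_n = uniform_var + 1 ** 2  # CLARITY = 10 - COLOR + N(0,1)
var_carat_n = uniform_var + 2 ** 2    # CARAT = 10 - CUT + 2*N(0,1)

print(mean_p, mean_n)
print(round(var_clarity_p, 1), round(var_clarity_n, 1), round(var_carat_n, 1))
```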
Inspection of the eigenvalues of the attribute (predictor)
intercorrelation matrix yielded strong evidence for ill-conditioning and
OLS mis-estimation, the smallest eigenvalue being less than .05.
A ridge regression analysis was applied to the attribute
intercorrelations and validity coefficients displayed at the bottom of Table 3.
Again, the critical constant (.2) added to the diagonal of the
intercorrelation matrix of attributes was determined from an inspection of
the "ridge trace." The ridge weights, given at the bottom of Table 3,
yield a multiple correlation of .83 (very close to the .84 value for
the OLS weights). Although the ridge weights are ordinally equivalent
to the OLS weights, they are different in sign on two attributes. The
ridge analysis suggests that CLARITY should have a negative orientation
to overall Price, consistent with the validity coefficient. In general,
the ridge weights are closer to the validity coefficients than are the
OLS regression weights, just as was the case for Model P.

Direct subjective weight assessment. As in the first experiment,
subjects were led into an adjoining room, and rank and ratio weights
were assessed. Two additional procedures were employed after
ratio-weight assessment: "pricing out" and "trading off to the most
important dimension" (Keeney and Raiffa, 1976). For the trade-off
procedure, subjects essentially specify the change on the most important
dimension that is equivalent to a standard change on each of the other
three dimensions. For the pricing-out method, subjects must specify
an amount of money that is equivalent to a standard change on each of
the four attributes. For all four assessment techniques, subjects were
forced to be consistent about their implied attribute rankings. The
reasoning behind all inconsistencies was explained. Notwithstanding
possible Rosenthal effects, all subjects expressed a desire to change
their responses to alleviate the problem. Only three instances of
inconsistency, all minor (weights were very close in magnitude),
were observed.

Results 

Achievement. Achievement scores (correlations between subjects'
holistic responses and outcome feedback) for each of the two blocks
of fifty trials were calculated. As in the first experiment, r_a showed
a consistent increase (median increased from .62 to .73) for all four
Model P subjects (P-1, P-2, P-3, P-4). However, the four Model N
subjects (N-1, N-2, N-3, N-4) showed no stable pattern for r_a scores.
Although two of the subjects' r_a scores remained about the same (N-2
and N-4), Subject N-1 showed a drastic decrease (from .39 to .06) while
Subject N-3 increased substantially (from .38 to .69).
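
As a minimal sketch of the achievement index, the correlation between
holistic responses and outcome feedback can be computed separately for each
block of fifty trials. The data below are synthetic and deterministic, built
only so that the first block is noisier than the second; the helper names are
illustrative assumptions.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def achievement_by_block(responses, feedback, block_size=50):
    """Achievement (response-feedback correlation) for each trial block."""
    return [
        pearson(responses[i:i + block_size], feedback[i:i + block_size])
        for i in range(0, len(responses), block_size)
    ]


# Synthetic 100-trial session: feedback values plus judgment error that
# is large in the first fifty trials and small in the last fifty.
feedback = [(i * 37) % 101 for i in range(100)]
error = [((i * 53) % 17 - 8) * (6 if i < 50 else 1) for i in range(100)]
responses = [f + e for f, e in zip(feedback, error)]

ach = achievement_by_block(responses, feedback)
print(ach)  # one achievement score per block of fifty trials
```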

Just as in Experiment I, Model P subjects' unaided holistic
evaluations were no better than two simple heuristic strategies, equal
weighting and extreme weighting. For Model P, the equal-weighting
model correlated .74 with the outcome feedback, and the extreme-weighting
model (attending only to COLOR) correlated .76. The Model N subjects,
however,  performed  substantially  worse  than  the  extreme-weighting 
model  (COLOR  only).  Although  the  equal-weighting  model  only  correlated 
.28  with  the  Model  N feedback,  one  subject  performed  even  worse  during 
the  fifty-trial  block.  Given  the  somewhat  lower  predictability  of  the 
Model  P feedback  as  compared  to  that  for  Model  N (see  Table  3),  and 
their near equivalence in predictability for the simple
extreme-weighting heuristic, the clear differences in achievement
between Model P and Model N subjects are surprising.

Consistency and pseudo-consistency. As in the first experiment,
consistency scores (adjusted multiple correlations from OLS regression
analysis on subjects' holistic responses) were obtained over each
trial block. In addition, consistency scores were computed using
ridge weights (the constant added to the diagonal was .2 for Model P
and .3 for Model N) instead of the OLS weights. Pseudo-consistency
scores were computed for ratio, rank, price-out, and trade-off weights
over both trial blocks. Both "OLS-ratio" and "ridge-ratio" weights were
obtained via regression analysis using the elicited ratio weights
as estimates of the validity coefficients, and pseudo-consistency
scores were computed for these two weighting schemes over both trial
blocks. (For ridge-ratio weights, the constant added to the diagonal
was .2 for Model P and .4 for Model N.) The obtained consistency
and pseudo-consistency scores are measures of the degree to which
the various weighting schemes yielded composites consistent with
subjects' holistic choices.
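
The statistical and hybrid weighting schemes described above can be sketched
as follows. This is a rough illustration, not the authors' program: it
assumes standardized attributes, takes ridge weights to be
(R_xx + kI)^-1 v with k the constant added to the diagonal, and uses one
common shrinkage formula for the adjusted multiple correlation. The
correlation matrix, validity coefficients, and ratio weights are invented for
the example.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ridge_weights(R_xx, validities, k=0.0):
    """Standardized weights (R_xx + k I)^-1 v; k = 0 gives OLS weights."""
    n = len(R_xx)
    A = [[R_xx[i][j] + (k if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    return solve(A, validities)


def composite_correlation(w, R_xx, r_xy):
    """Correlation between the composite sum(w_i * z_i) and the criterion."""
    n = len(w)
    num = sum(w[i] * r_xy[i] for i in range(n))
    var = sum(w[i] * R_xx[i][j] * w[j] for i in range(n) for j in range(n))
    return num / math.sqrt(var)


def adjusted_multiple_r(r, n_obs, n_pred):
    """Shrinkage-adjusted multiple correlation (an assumed formula)."""
    adj = 1.0 - (1.0 - r * r) * (n_obs - 1) / (n_obs - n_pred - 1)
    return math.sqrt(max(adj, 0.0))


# Hypothetical inputs: attribute intercorrelations with two large negative
# correlations, correlations of each attribute with one subject's holistic
# responses, and that subject's elicited ratio weights.
R_xx = [[1.0, -0.6, 0.0, 0.0],
        [-0.6, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, -0.6],
        [0.0, 0.0, -0.6, 1.0]]
r_xy = [0.50, 0.10, 0.40, -0.05]
ratio = [0.40, 0.15, 0.35, 0.10]

ols = ridge_weights(R_xx, r_xy)                 # OLS on holistic responses
ridge = ridge_weights(R_xx, r_xy, k=0.3)        # ridge on holistic responses
ols_ratio = ridge_weights(R_xx, ratio)          # hybrid: ratio weights as validities
ridge_ratio = ridge_weights(R_xx, ratio, k=0.4)

scores = {name: composite_correlation(w, R_xx, r_xy)
          for name, w in [("OLS", ols), ("ridge", ridge), ("ratio", ratio),
                          ("OLS-ratio", ols_ratio), ("ridge-ratio", ridge_ratio)]}
consistency = adjusted_multiple_r(scores["OLS"], n_obs=50, n_pred=4)
print(scores)
print(consistency)
```

Because the OLS weights maximize the composite-criterion correlation in the
sample, every other scheme can at best tie the unadjusted OLS value; the
question of interest is how close the elicited and hybrid schemes come.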

Several  important  results  are  evident  here.  First,  the  various 
MAU  models  elicited  from  Model  P subjects  are  much  more  consistent 
with  their  holistic  responses  from  the  last  block  of  trials  than  from 
the  first  block.  For  Model  P,  the  median  correlations  range  from 
.78  to  .88  for  the  second  block,  compared  to  the  .73  to  .81  median 
range  for  the  first  block.  A different  pattern  emerged  for  Model  N 
subjects,  who  showed  no  differences  in  consistency  from  the  first  trial 
block  to  the  second. 

A second main result is the rather substantial difference between
the two models (N and P) in the overall levels of consistency. The
maximum median consistency (or pseudo-consistency) score reported for
Model N, over both trial blocks and all eight sets of obtained weight
estimates, is .63. In contrast, Model P median consistency (or
pseudo-consistency) scores are in the .70s for the first trial block
and in the .80s for the second. The lower consistency scores for Model N
subjects indicate that their holistic responses were less predictable
from the four attributes than were those for Model P subjects. The
lower pseudo-consistency scores for Model N indicate that Model P
subjects were better able to describe the weighting policy they
actually used in generating their holistic estimates.

Perhaps the most striking result is the near equivalence among
all of the consistency and pseudo-consistency indices for each
particular trial block and model. Other than the marked failure of the
hybrid OLS-ratio technique for Model N, all of the assessed (or derived)
weighting schemes predicted subjects' holistic responses equally well;
little or no consistent pattern emerged from the data. Although the
OLS and ridge composites tended to be in closer correspondence to
holistic responses than composites from either the direct subjective
assessments (ratio and rank) or the indifference assessments (price-out
and trade-off), the differences appear very slight. While ridge and
OLS weights are essentially equivalent in power to predict holistic
responses, the ridge-ratio weights are substantially more predictive
than the OLS-ratio weights. Thus, when the attribute validity
coefficients were derived from subjects' holistic evaluations, little
difference between the ridge and OLS weights emerged. However, when
the validity coefficients were estimated directly (from the subjects'
ratio-weight assessments), the ridge analysis yielded weights strikingly
more predictive of holistic responses.


Criterion validity - matching and pseudo-matching. The criterion
validity of each of the eight sets of subjects' weights was assessed
by computing matching indices for OLS and ridge weights and
pseudo-matching indices for the remaining weighting schemes. Both the
"true OLS" and "true ridge" weights, presented in Table 3 for Models P
and N, were used as criterion models. The matching and pseudo-matching
correlations are presented in Table 4.

Overall, the matching and pseudo-matching measures are quite high
for Model P subjects. Median correlations range from .94 to 1.00
for Model P subjects, indicating that the assessed (derived) weight
composites agree with both sets of true weight composites. For Model
P, there is no evidence of differences among the eight sets of assessed
weights or between the two sets of "true weights." Although the two
hybrid weight composites diverge somewhat from the "true weight"
composites for subject P-1, all four Model P subjects display the same
general pattern of extremely high matching and pseudo-matching.
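
A matching or pseudo-matching index of this kind can be sketched as the
correlation, over the set of stimuli, between the composite formed from a
subject's weights and the composite formed from the criterion weights. The
diamond profiles and both weight vectors below are hypothetical, and the
function names are illustrative only.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def composite(profiles, weights):
    """Additive composite score for each stimulus profile."""
    return [sum(w * a for w, a in zip(weights, row)) for row in profiles]


def matching(profiles, subject_weights, true_weights):
    """Correlation between subject-weight and true-weight composites."""
    return pearson(composite(profiles, subject_weights),
                   composite(profiles, true_weights))


# Hypothetical attribute scores (CUT, COLOR, CLARITY, CARAT) for 8 diamonds.
profiles = [
    [1, 2, 3, 1], [2, 1, 4, 2], [3, 4, 1, 3], [4, 3, 2, 5],
    [5, 5, 5, 4], [1, 4, 2, 2], [3, 2, 5, 1], [4, 1, 3, 3],
]
true_w = [0.2, 0.5, 0.1, 0.2]     # assumed criterion weights
elicited_w = [0.3, 0.4, 0.1, 0.2]  # assumed ratio weights from one subject

pseudo = matching(profiles, elicited_w, true_w)
print(pseudo)
```

Since the index is a correlation between composites, it is unaffected by
rescaling either weight vector, which is why quite different-looking weight
sets can yield nearly identical matching scores.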

The pattern of results is more complicated for the four Model N
subjects. Although the correspondence is lower, in general, than that
for Model P, there are obvious individual differences. In comparing
the subjects' ridge weights to their OLS weights, all four subjects show
better pseudo-matching for their ridge weights when the validity
coefficients are directly estimated from the ratio weight assessment. Two
of the subjects (N-1 and N-2) show a superiority for ridge weights when
the validity coefficients are estimated from the subjects' holistic
evaluations. As for Model P, there is no consistent pattern in the
pseudo-matching scores for the four post-training sets of weights. For
subjects N-1 and N-2, all four post-training weighting schemes
corresponded more highly to the criterion than did any of the statistical
weights. For subject N-4, the statistical weights were better than
the post-training weights. There were no differences for subject N-3,
who showed the best matching and pseudo-matching of all the Model N
subjects across all eight sets of assessed weights. For all four
Model N subjects, the hybrid-weight (ridge-ratio and OLS-ratio)
composites demonstrated the largest divergence from the "true"
criterion weight composites.

[Table 4: Matching (r_m) and Pseudo-Matching Scores]

The pseudo-matching baseline correlations for equal and extreme
weighting, given at the bottom of Table 4, indicate that the Model P
subjects' elicited weights were an improvement over either of the two
heuristic weighting schemes. Although the equal weighting scheme for
Model N is rather poor, the extreme weighting heuristic provides as
high a pseudo-matching score as any of the Model N subjects, with the
notable exception of subject N-3.
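
How well the equal and extreme weighting baselines do depends on the
attribute intercorrelations. As a rough sketch, for standardized attributes
with intercorrelation matrix R, the correlation between two composites with
weights w1 and w2 is w1'Rw2 / sqrt(w1'Rw1 * w2'Rw2). The matrices and the
"true" weights below are hypothetical stand-ins, not the values in Table 3.

```python
import math


def composite_corr(w1, w2, R):
    """Correlation between composites w1.z and w2.z, where the
    standardized attributes z have intercorrelation matrix R."""
    def bilinear(a, b):
        n = len(a)
        return sum(a[i] * R[i][j] * b[j] for i in range(n) for j in range(n))
    return bilinear(w1, w2) / math.sqrt(bilinear(w1, w1) * bilinear(w2, w2))


# Attribute order: CUT, COLOR, CLARITY, CARAT (all values hypothetical).
true_w = [0.2, 0.5, 0.1, 0.2]
equal_w = [1.0, 1.0, 1.0, 1.0]
extreme_w = [0.0, 1.0, 0.0, 0.0]  # attend only to COLOR

R_pos = [[1.0, 0.0, 0.0, 0.0],    # one large positive correlation
         [0.0, 1.0, 0.0, 0.7],
         [0.0, 0.0, 1.0, 0.0],
         [0.0, 0.7, 0.0, 1.0]]
R_neg = [[1.0, -0.6, 0.0, 0.0],   # two large negative correlations
         [-0.6, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, -0.6],
         [0.0, 0.0, -0.6, 1.0]]

for label, R in [("positive", R_pos), ("negative", R_neg)]:
    print(label,
          composite_corr(equal_w, true_w, R),
          composite_corr(extreme_w, true_w, R))
```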

Discussion 

Experiment II was designed to test the validity of eight
procedures for assessing importance weights in a more complicated
multi-attribute situation than that of the first experiment. The construct
of "overall diamond value" was less predictable from the four
attributes provided (R_e = .82 and .83 for Models P and N, respectively).
Also, the set of alternatives, S, was constructed so as to present large
intercorrelations among the four diamond attributes (one large
positive correlation for Model P, and two large negative correlations for
Model N). The study is the first attempt to test the criterion
validity of subjects' ridge weights, hybrid weights (suggested by
Newman, Note 2), and indifference weights.

For Model P, all four subjects learned the diamond model well
(high r_a), and provided weights consistent with holistic evaluations
(high R_s), with one another (high convergence), and with the "true"
MAU model taught (high r_m). Virtually no differences were observed
among the eight sets of subjects' weights, in terms of the composites
derived from them. Thus, the results of Model P subjects indicated
that the criterion validity evidenced in Experiment I also holds in
a context in which the weights are less explicit (lower R_e and
non-zero attribute correlations) and over a broad range of weight
assessment approaches. The success of the novel ridge and hybrid
techniques is especially important.

The  results  for  Model  N are  not  consistent  with  those  for  Model 
P and  Experiment  I.  One  of  the  problems  was  that  two  of  the  four 
subjects  did  not  learn  the  MAU  appraisal  model  very  well,  as  was 
evidenced by the low r_a scores. For the two subjects who did learn
the  model,  weights  were  obtained  from  one  subject  which  were  highly 
valid,  but  the  non-statistical  weights  obtained  from  the  other  subject 
yielded  composites  highly  discrepant  from  those  of  the  "true"  diamond 
model.  Although  the  weights  obtained  from  the  other  two  subjects 
were  uniformly  poor,  there  was  a substantial  superiority  evidenced 
for  the  post-training  assessments  over  the  statistical  and  hybrid 
approaches. 

The most surprising result is the extreme difference in Model
P and Model N subjects' performance. The only difference between
the two models is reflected in the sample intercorrelation matrix
of the four attributes. Thus, subjects' ability to learn a MAU
model (i.e., the relationship between attributes and an overall
criterion construct of value) is greatly dependent upon the environmental
relationships among the salient attributes. Since only two
subjects obtained satisfactory achievement scores, the results
comparing assessment techniques for Model N are inconclusive.


Conclusions  and  General  Discussion 


Two experiments were conducted to assess the validity of
several weight assessment techniques. In the first, a four-attribute
MAU model with zero environmental correlations among attributes
was taught to nine subjects. The regression-, rank-, and ratio-weight
estimates all resulted in composites which closely matched those of
the true model; most subjects' weighting schemes were a great
improvement over either equal or extreme weighting. For three of the
nine subjects, the rank and ratio assessments produced lower matching
than did the regression-weight estimates.

In the second experiment, a total of eight subjects were taught
one of two four-attribute MAU models, each involving substantial
attribute intercorrelations. Both of these models were less explicit
(more error variance) than the one taught in Experiment I. A total of
eight methods were employed in assessing subjects' importance weights:
OLS and ridge regression on holistic choices, OLS and ridge regression
using ratio-weight estimates as validity coefficient estimates, direct
subjective ranking and ratio estimation, and the two indifference
techniques of pricing-out and trading-off to the most important dimension.
For the model involving one large positive correlation between two
of the attributes, all eight weight assessment methodologies produced
equally good composites; all composites derived from subjects' weights
corresponded to the "true" model composites better than simple heuristic
rules such as equal weighting and extreme weighting. For the model
involving two rather large negative intercorrelations among attributes,
the results are inconclusive. Although the statistical weights were
superior for one subject who seemed to have learned the model well,
the direct assessment and indifference weights were superior for two
of the subjects who did not learn the model so well. Only one
subject produced valid weights across all eight assessment techniques.

The present research and findings are interesting from both an
applied and theoretical perspective. For the applied decision
analyst (or judgment analyst), the work by Schmitt (1978) and that
presently reported contribute strong evidence to the assertion that
the additive MAU model is a valid prescriptive tool. The evidence
that people can indeed provide direct subjective estimates of
importance weighting is important. In most interesting decision
problems, such as choosing a school desegregation plan or siting a
nuclear power plant, a large alternative set is not readily known
a priori. In such applied situations, the feasibility of most
indirect holistic approaches to deriving importance weights is in
doubt. Even if a reasonably large set of alternatives could be
generated, in most cases the number of dimensions involved makes
the task of holistic evaluation of alternatives extremely difficult,
if not impossible.

The applied decision problem of Edwards (Note 3) is a good
example. The decision-makers, the board members of the Los Angeles
Unified School District, were faced with a MAU problem of seven
alternatives. Each alternative was a detailed (or not so detailed)
plan for desegregating the Los Angeles school system. In the final
decision tree developed by Edwards, these plans were defined on 144
dimensions of importance. Any approach to defining the importance
weights that depended upon holistic assessments of these few
alternative plans, defined on so many dimensions, would have been
hopelessly inadequate. Ratio weighting, the assessment technique
actually applied, was much more reasonable. The board members found
the task of assigning ratio weights not only possible, but somewhat
therapeutic. That is, Edwards' ratio-weight procedure forced them to
think hard about their values and how they related to the overall
utility of various desegregation plans.

Most applied decision analysts would like to think of themselves
as more than therapists, however. The overwhelming belief among
most decision analysts is that their methods elicit parameter estimates
of preference models that result in a normative choice structure.
That is, decision analysts believe that their clients should behave
in the manner suggested as optimal by the elicited choice structure.
Although the stimuli used in the present study (diamonds defined on
four dimensions) and in Schmitt's (1978) study (graduate applicants
defined on four attributes) are simplistic, and the acquisition of
information about attribute importance is contrived (feedback
learning), the results suggest that attribute importance is a valid
psychological construct. That people can make accurate estimates of
importance weights in the laboratory setting is certainly a necessary
condition for their being able to do so in the more complex and
emotional settings usually faced by a decision analyst and his/her clients.

From a theoretical perspective, the present study and that
reported by Schmitt (1978) are highly relevant in the current debate
over the extent to which people are aware of their own cognitive
processes. In a recent article on the topic of verbal reports of
mental process, Nisbett and Wilson (1977) summarize the Slovic and
Lichtenstein conclusions on subjective weighting as a "fair
assessment of this literature." In their review, however, Nisbett
and Wilson used the term "impressive" in describing the "evidence
of at least some correspondence between subjective and objective
weights (p. 254)." Of course, Nisbett and Wilson's perception of
the subjective weighting literature (based on the conclusions of
Slovic and Lichtenstein, 1971) is out of date. Given the present
data and the review by John and Edwards (Note 1), there is little
reasonable justification for the claim that subjects cannot directly
report beliefs about attribute importance.

From the perspective of Nisbett and Wilson, however, even the
mostly negative conclusions of Slovic and Lichtenstein (1971) had
to be reconciled with an overwhelming literature that people are
totally incapable of introspection about cognitive processes. (For
a rebuttal to the Nisbett and Wilson conclusions concerning
self-insight and awareness in general, see Smith and Miller, 1978.)
Nisbett and Wilson (1977) assert the following:

It seems likely, in fact, that clinicians and stockbrokers
could assign accurate weights prior to making
the series of judgments in these experiments simply
by calling on the stored rules about what such
judgments should reflect. If so, one would scarcely
want to say they were engaging in prospective
introspection, but merely that they remember well the
formal rules of diagnosis or financial counseling
they were taught (p. 254).

The results from the current study and Schmitt's (1978) study
challenge Nisbett and Wilson's speculations. With only minor
exceptions, subjects were able to provide importance weights
predictive of their own holistic evaluations in an experimental
setting for which there were no stored "rules" for determining
judgments. The diamond appraisal policies in the present study
were learned indirectly, without the intervention of verbal
descriptions or formal linguistic rules. Subjects demonstrated an
awareness of both their own rules for making diamond appraisals
and the criterion diamond model used to generate the outcome
feedback.

The present study suggests that an important future variable
in research on importance weighting is the intercorrelation matrix
of attributes. Although the "true" criterion model is more
difficult to determine when attributes are intercorrelated, the application
of biased regression techniques makes the task a manageable one.^4

The results of the present study were moderately encouraging for
the novel hybrid weighting approach suggested by Newman (Note 2);
further research is needed, however.

A possibly important intervening variable in the assessment of
importance weights is the amount of exposure subjects have to the
"true" MAU model. The explicitness of the MAU model is another
potential intervening variable. If the overall utility of the
stimuli is not predicted well by the attributes considered (high
error variance), subjective weights may not be so accurate. The
amount of experience (number of learning trials) of the decision-maker,
and the strength of the relationship between the MAU model attributes
and the construct of overall utility (R_e), are concrete variables,
often highly descriptive of specific applied settings. The first
variable relates to the notion of decision-maker expertise, while
the second is a function of the defining characteristics of the
decision problem. Future research on importance weighting should
systematically explore the effects of the number of trials of
feedback learning and R_e on subjective estimates of attribute
importance. The problem of group assessment of importance weights
is yet an additional topic for future research that has heretofore
received little or no attention.